Chapter 5

Durk Jan de Bruin

Is It Legal?

How should programs respond to illegal data? Assuming that data will be legal asks for trouble: What if users mistype? What if users want to confuse the program? This case study describes how to guard against illegal data in the Check That Number!, Banners With CLASS, Calendar Shop, and Roman Calculator Construction programs.

Problem statement

The problem statements for Check That Number!, Banners With CLASS, The Calendar Shop, and Roman Calculator Construction all make assumptions about the nature of the input. Such assumptions are rarely justified. What if users mistype? What if the user tries to confuse the program? A better approach is to make sure that only legal input is processed. This case study explores general ways to check input for errors as well as specific solutions for the programs encountered up to now.

Modify the programs for Check That Number!, Banners With CLASS, The Calendar Shop, and Roman Calculator Construction so that they handle invalid input appropriately. They should prompt for input, read a value or values from the user without crashing, analyze the values, and repeat this process until the user provides legal input.

Recall the legal input for the four earlier case studies:

Check That Number!	a line containing only a four-digit identification number
Banners With CLASS	a line containing a legal letter size (an integer between 5 and 24, inclusive), possibly preceded by blanks; a line containing exactly six characters, all chosen from P, Y, T, H, O and N
The Calendar Shop	a line containing a legal year, possibly preceded by blanks
Roman Calculator Construction	a line containing a legal Roman numeral

Analysis

5.1 Why was it initially reasonable to assume that the data for the Check That Number! and Roman Calculator Construction programs would be legal? Why would this assumption not be appropriate for the Banners With CLASS or Calendar Shop programs?

Reflection

5.2 How susceptible are you to making typing errors? If you are an inaccurate typist, how do you catch errors?

Reflection

5.3 Which programs, when you run them, have left you unsure about what to type at a given point? What improvements that would help might be incorporated in these programs?

Testing

5.4 Think of ways to crash the programs produced in earlier case studies. Catalog the error messages or crash situations that result.

Chapter 5

Durk Jan de Bruin

Preparation

Solutions in this case study use one-dimensional arrays, loops, and functions.

Checking for Legal Input

What changes are needed to guard against illegal data?

The problem statement outlines several changes that must be made to the programs in earlier case studies to handle invalid input. First, each program is to print a prompt before reading, to alert the user to what kind of input is requested. Second, it must read the input without crashing, and analyze it. Third, the program must reprompt as long as the input is illegal.

Stop & Predict

What parts of the input-handling code will be common to all the programs?

Thus all the programs will need code that prompts for input; only the contents of the prompt will differ between programs. All the programs will contain some sort of loop that continues to prompt, read, and analyze the input until it's legal. The analysis that must be performed on the input will probably differ from program to program. We will start by concentrating on the code that is similar among all four programs, and postpone the program-specific code.

Stop & Predict

Did any of the programs from earlier case studies involve a check for legal data?

The Calendar Shop program already includes a check that its input is legal. Here is its input loop:

while not Legal(year):
print('Please type the year you want a
calendar for: ')
year = int(input())
if not Legal(year):
print('Can\'t make a calendar for that year.')

How can the code from The Calendar Shop be recycled?

The code in The Calendar Shop suggests a framework for the loop that reads legal values from the user:

repeat
prompt for input
read a set of input values
if the values aren't legal then print an error message
until the values are legal

Stop & Consider

Why is while more appropriate for the above loop than for?

What template is represented in this decomposition?

This is another example of the "input one, process one" template:
Input: Prompt for input, then read a set of input values.
Process: Check for legality, and print an error message if necessary.

What type of variable should be used for input?

Trying to read a non-numeric value into an integer or float variable will cause an error. However, any input can be read into a character variable without error. Thus the input step should read the values as characters and convert them to a numeric representation if necessary. (The template for converting characters to numeric values was introduced in Check That Number!)

Stop & Predict

How should the character variables be stored?

To read a numeric value in this way, each digit will have to be stored as a character. This argues for the use of an array of characters to store the input. Use of an array will allow values that are too long or too short to be easily detected.

How much should be read at a time?

In all these programs the input is on a single line, so reading one line at a time and checking if the information is legal will work. In particular, Check That Number! and The Calendar Shop each expect one numeric value on the line. Banners With CLASS expects a numeric value on one line and six letters on the next, and Roman Calculator Construction expects one string of characters.

What definitions and declarations are used?

To read values into an array we use the "fill an array" template introduced in the Roman Calculator Construction program.

Using a constant to specify the maximum line length for convenience in adapting to the four different applications yields the following definitions and declarations:

LINELENGTH = some positive integer
LineType = ''
line = LineType

How Is a line read?

We can recycle code to read values into an array from the entire-numeral version of the Roman Calculator Construction program:

length = 0
ch = input()
length = len(ch)

Chapter 5

Durk Jan de Bruin

Stop & Predict

What can go wrong with array input?

What if the line is longer than the array?

The code will crash if enough input is provided to overflow the array. We can guard against this error with a test:

length = 0
ch = input()
length = len(ch)
if length <= LINELENGTH:
line = ch

Stop & Consider

What would be wrong with code that assigned a value to length only? length was smaller than LINELENGTH, as follows?

What is the right length for LINELENGTH?

To detect a value that is too long, LINELENGTH should be one more than the length of the longest legal line. This will allow us to store any legal line and to detect lines that have too many characters.

How does detecting legal values fit into a function?

Just as we used the function ReadRoman in Roman Calculator Construction, we package the input code here into a function ReadLine:

def ReadLine():
global length
global line
ch = ''
ch = input()
length = len(ch)
if length <= LINELENGTH:
line = ch

What happens if the input is illegal?

If the input is illegal the program should print an error message. We decide to test legality in a boolean function called IsLegal, similar to that used in The Calendar Shop program, and update the main loop as follows:

while not IsLegal(line, length):
prompt for input
ReadLine()
if not IsLegal(line, length):
print an error message

Testing

5.5 What error message does your Python environment produce if, when running a program, you type a letter when an integer is expected?

Testing

5.6 Does the text editor on your system impose a limit on the length of a line? If so, how does it handle a line that's too long?

Modification

5.7 The input-handling loop includes two calls to IsLegal, where only one is necessary. Modify the code to save the result of the first call to IsLegal in a boolean variable, and then to check the saved result when determining whether or not to continue looping.

Analysis

5.8 What is the advantage of using a boolean function in the input loop?

Modifying the Check That Number! Program

What are illegal inputs to Check That Number?

The program for Check That Number! has an input of exactly four digits and nothing else. The IsLegal function can thus check length, then check each of the four characters to make sure they are digits.

def IsLegal():
global line
global length
k = 0
IsLegal = True
if length != 4:
IsLegal = False
else:
for k in range(4):
if not IsDigit(line[k]):
IsLegal = false
return IsLegal

What template is used for checking the digits?

Checking for digits is an instance of the template "process every element of an array." Here the "processing" is making sure it's a digit. Making IsDigit a function with an informative name communicates that it performs a well-defined activity and makes the code clearer as well.

Stop & Predict

Write the IsDigit function. Recycle code from Check That Number!

What are good test cases?

Testing for illegal data requires a comprehensive set of illegal data, best generated using the black-box approach. For this program we start with typical cases with four digits to make sure that the program still works. Then we try extreme cases.

Chapter 5

Durk Jan de Bruin

To find the extremes, it helps to consider numeric quantities associated with the input, and to choose data for which these quantities are as small and as large as possible. One such quantity is the length of the input. Thus the extremes are an input line that's as long as possible and another that's as short as possible. Since we're worried about illegal data, we must also check lines that are 1 character too long and 1 character too short, with incorrect numbers of digits. These are cases with 0, 3, and 5 digits.

Two other quantities associated with the input are the position in the line at which an error can occur, and the number of digits in the line. We need test cases with the position as small and as large as possible, i.e. with nondigits at the front, at the back, and as the only input. We should also test lines containing all digits (that's the typical case) and with no digits.

The complete Check That Number! program, with error checking, appears in the Python Code section.

Stop & Help

Devise test data in the categories just described, and test the program.

Analysis

5.9 Why should the length check precede the digit-checking loop?

Debugging

5.10 What is wrong with the following code for the digit-checking loop?

for k in range(4):
Legal = Digit(line[k])

Explain the circumstances under which it would cause a different answer to be returned.

Modification

5.11 On many Python systems, it is possible to signal end-of-input from the keyboard. The program just designed will crash or enter an infinite loop if this is done. Using a Python's function, modify the program to detect and handle end-of-input signaled by the user.

Modification

5.12 The code that checks that if characters in the line are digits need not check all the characters, since it is an error if any nondigits appear in the line. Recode the loop as a while or for loop that stops executing when a nondigit is encountered in the line.

Modification

5.13 Modify the code to print, for each nondigit in the line, a message informing the user of the position of the nondigit.

Modifying Banners With CLASS and The Calendar Shop

What are illegal inputs to Banners With CLASS and The Calendar Shop?

The Banners With CLASS program has two inputs. One is a "bounded" integer-an integer bounded by specified values. The other is a set of six letters. We'll start with the latter.

How can the letters in Banners With CLASS be checked?

One line of input to Banners With CLASS should contain exactly six letters chosen from the set 'P', 'Y', 'T', 'H', 'O' and 'N'. Checking this line is just like checking the input to Check That Number!

Stop & Help

Write a boolean function called IsLegalLine that checks a line of letters to see that there are exactly six characters on the line and that they all are either 'P', 'Y', 'T', 'H', 'O' or 'N'.

The CalendarShop program also reads a bounded integer. A general subprogram that returns a legal integer will thus be useful in both programs and probably in others as well. Its code can be altered if necessary to check that the integer is in the valid range. We will design a function called ReadlnInteger-consistent with Python's input-that returns a legal integer value in its single parameter.

How can bounded integers be checked?

To be legal, a bounded integer needs to be composed of digits and in the specified range. To be consistent with integer input in Python, the program should ignore blanks that appear before the integer.

What are alternatives for decomposing the input and testing of an integer?

Alternatives for reading and testing integers include the following.

Ignoring the leading blanks in the line:
read a line, ignoring leading blanks;
check that it consists entirely of digits;
if so, then
convert it to an integer value;
check if the integer value is in range;

Storing the leading blanks in the line:
read a line;
check that it consists entirely of blanks followed by digits;
if so, then

Chapter 5

Durk Jan de Bruin

convert it to an integer value;
check if the integer value is in range;

Storing, then removing leading blanks in the line:
read a line;
remove the leading blanks;
check that it consists entirely of digits;
if so, then
convert it to an integer value;
check if the integer value is in range;

Since there are several ways for an input not to represent an integer, each of the alternatives requires that extra code be added to the error-checking loop designed in the previous section.

Stop & Predict

Which of the three alternatives seems best? Why?

Stop & Predict

In which alternative can the user provide a sequence of blanks followed by a sequence of digits that results in the program not returning the numeric value? How can this happen?

Which decomposition should be chosen?

The easiest alternative to implement is the "storing and removing blanks" decomposition because each step is an easy array operation. To use an array, however, we must know its length in advance. No matter what length we choose, an obnoxious or obtuse user can provide more blanks than the array can hold. If the line array is not long enough to store all the blanks, the program may crash, or at least not be able to store all the digits. It seems better to ignore the leading blanks, so we choose the first decomposition above.

How are the nonblank characters in a line read?

The input step should skip leading blanks and read the remaining characters into an array. Leading blanks are easily detected. However, there are two ways for the blank-skipping loop to terminate: either a nonblank is read or the end of the line is encountered.

Stop & Predict

How can tests for blanks and for end-of-line be combined?

How can a while loop be used to test for blanks and end-of-line?

The problem with using a while loop to skip leading blanks is that eoln must be tested before input is done, whereas blanks can only be detected after a character is read. Combining these tests will be complicated and hard to understand. To test for both eoln and for blanks, we instead use a boolean variable done along with a while not done loop.

Within the loop, the various conditions for termination are tested in sequence, and the done variable is set appropriately. We alter the ReadLine code from the previous section as follows:

done = False
while not done:
if ch == BLANK:
done = True
else:
ch = input()
if ch != BLANK:
done = True

A constant called BLANK, already used in The Calendar Shop program, is a more descriptive way to represent a blank character than an unnamed quoted blank.

Stop & Consider

Explain why the above loop is sure to terminate.

Stop & Predict

When the loop terminates, what do you know about the contents of ch?

How is the rest of the line read into the line array?

To finish designing code that reads the line, we must analyze its postcondition, the condition that is true at the end of the loop just written. When the blank-skipping loop terminates, either ch contains the first nonblank character on the line, or eoln is true.

Stop & Predict

Can the array-filling code come right after the blank-skipping code?

Appending the array-filling code immediately after the loop would give the following result:

global length
global line
ch = ''
ch = input()
length = len(ch)
if length <= LINELENGTH:
line = ch

What must be done to the array-filling code to make it usable?

The preconditions of the array-filling code-the conditions that must be true immediately before the code for it to work correctly-do not match the postconditions of the blank-skipping code.

Postcondition of blank-skipping code: eoln or (ch != BLANK) Precondition of input-and-store loop: No characters have yet been read from the line.

Chapter 5

Durk Jan de Bruin

This loop would then be the same as the loop from ReadLine, except that it would fill the second and following characters of line rather than the first and following characters.

This code works only if ch contains a nonblank. So now the problem is to determine if ch contains a nonblank. Can we test it? No, since it wasn't initialized. Can we check eoln instead? No, because it may happen that eoln is true and ch contains a nonblank.

Stop & Help

Under what conditions will eoln be true and ch contain a non blank?

To determine whether to add elements to line, we could modify the blank-skipping code in two ways, either by initializing ch to a blank or by setting length to 1 when a nonblank is first encountered. Code that skips blanks will probably be useful in the future, so we initialize ch. References to a variable-length-that really has nothing to do with the blank-skipping would clutter the code and make it more difficult to recycle in other programs.

How is line checked to consist only of digits?

Once line is filled, it must be checked to make sure that its characters are all digits. We did this already for the Check That Number! program, and we recycle that code:

for k in range(length):
if not IsDigit(line[k]):
IsLegal = False

How are the digits in line converted to an integer?

Converting the digits in line to a single integer value involves extracting the numeric digit values and computing the integer. First we convert characters to integers, as done in the Check That Number! program.

Stop & Help

Using the code from Check That Number! as a model, write a function called DigitValue that returns the numeric value of a character digit.

How are digit values combined to get an integer?

Combining the various digit values into an integer value requires an accumulation loop similar to that used in Roman Calculator Construction. The accumulation loop will use a sum variable and add each digit value in turn. Each time through the loop, one more digit will be "attached" to the accumulated value.

We consider an example, 5193. These characters will be stored in the line array as follows:

We want a loop that successively accumulates the first character in line, then the second character, and so on into an integer value, as shown in the following sequence of values:

Number of times through loop	Digit to accumulate	Accumulation value
1	5	5
2	1	51
3	9	519
4	3	5193

As noted in Check That Number! digit in a decimal numeral represents a power of 10. Thus, to get from one accumulation value to the next, the first value should be multiplied by 10, then the new digit value should be added. Here's the code:

sum = 0
for k in range(length):
sum = sum * 10 + DigitValue (line[k])

Stop & Help

Code the conversion of digits to an integer as a function called IntegerValue.

How are the various code segments combined?

All steps of the decomposition of the process of reading and error checking an integer have now been coded. The only remaining design decision concerns the arrangement of the code into functions.

functions should be easy to reuse and should help make the code readable. Thus, each subprogram should perform one easily described action and should not perform actions that would inhibit use in another context.

These criteria suggest four functions: ReadTrimmedLine, IsAlldigits, IntegerValue, and IsInRange. Since there are several ways for the input to be illegal, we code the body of the read-and-error-check loop in similar fashion to the blank-skipping loop, using a boolean variable called error.

def ReadInteger(n):
error = False
done = False
global line
global length
while not done:
error = False
print('Type a size for the letters between
5 and 24 : ', end = '')
ReadTrimmedLine()
if (length == 0) or (length > LINELENGTH):
error = True
elif not IsAllDigits(line):
error = True
else:
n = IntegerValue()
if not IsInRange(n):
error = True
else:
done = True
if error:
print('You must type an integer between
5 and 24.')
return n

At the end of this loop, the parameter n contains the integer value typed by the user.

Chapter 5

Durk Jan de Bruin

Stop & Help

Add the prompt message, the in-range check, and the error message appropriate for the Banners With CLASS program.

Stop & Help

Add the prompt message, the in-range check, and the error message appropriate for the Calendar Shop program

How is the code summarized?

The various error checks focus increasingly specifically on the input. Their narrowing action is represented by the diagram below. A check is made to see if there are any nonblanks in the input, then another to see if the nonblanks represent a number, then (after conversion) a third to make sure the number is in the specified range.

The Python Code section displays the use of the error-checking code along with arrays in the Banners With CLASS program. Error checking may be similarly included in the Calendar Shop program.

How should the code be tested and debugged?

As in previous programs, we test and debug the code a piece at a time. We start with a main program that repeatedly calls ReadlnInteger and prints the result:

while True:
ReadlnInteger (n)
print('value read is ' + n)

Alternatively, we could use a for loop to call ReadlnInteger a specific number of times, or use a test for a specific input value as an indication to stop.

ReadInteger calls four functions, ReadTrimmedLine, IsAllDigits, IntegerValue, and IsInRange. IsAllDigits and IsInRange are short, and the code for IsAllDigits was tested in the previous section, so we decide not to test them individually. That leaves ReadTrimmedLine and IntegerValue.

What code should be tested first?

One might test these two functions individually in a variety of ways. We choose to test ReadTrimmedLine by calling it from ReadInteger, to code IntegerValue initially as a stub that always returns 0, and to code InRange always to return true. For debugging output, we add a statement that prints the line and the length after returning from ReadTrimmedLine.

What test values should be used?

As before, we consider numeric quantities associated with the input to generate extreme cases. The number of leading blanks and the number of nonblanks in the line are two such quantities. Thus we test ReadTrimmedLine on lines with 0, 1, and 2 leading blanks, in combination with lines containing 0 nonblanks, 1 nonblank, the maximum allowed number of nonblanks, and 1 more than the maximum allowed number of nonblanks.

Stop & Help

Generate a set of test data for ReadTrimmedLine.

How should the remaining code be tested?

Once ReadTrimmedLine is tested and debugged, we can replace the stubs for IntegerValue and InRange. Test data for these routines should involve input with one digit, input with a maximum number of digits, and input at both sides of the boundaries of the legal range of values.

Stop & Help

Identify typical and boundary cases for testing the letter size input to the Banners With CLASS program. Use them to test the program in the Python Code section.

Stop & Help

Identify typical and boundary cases for the Calendar Shop program.

Testing

5.14 Run some experiments to find out if your Python environment limits the number of blanks that may precede integer input; if it does limits the number of leading blanks, determine the maximum number it allows.

Analysis

5.15 One might argue that the loop in IsAllDigits should stop when it detects a character in line that's not a digit, and that therefore the loop should be coded as a while or repeat. What do you think? Justify your choice.

Analysis

5.16 The IntegerValue function does not guard against constructing a value greater than maxint. What other part of the program can be used to ensure that no attempt is made to construct an integer that exceeds maxint?

Modification

5.17 Fix IntegerValue to return maxint if the contents of line represents a larger value.

Chapter 5

Durk Jan de Bruin

Modification

5.18 Reorganize the code to check, before calling IntegerValue, that the characters of line represent a value at most maxint.

Application

5.19 Describe how blank-skipping code could be used in a program that identified words in text.

One Way to Modify Roman Calculator Construction

What should be the first step of the modified ReadRoman?

Error checking the Roman Calculator Construction program involves modifying its ReadRoman function to return only legal Roman numerals. As with the other programs, this should start with a call to ReadLine to read the Roman numeral into a character array.

Stop & Help

How long should the line array be? That is, what is the longest Roman numeral whose value is less than 4000?

What are illegal Roman numerals?

Illegal Roman numerals are those that violate the rules for Roman numerals. Recall the description of a legal Roman numeral:

All its characters are Roman digits (M, D, C, L, X, V, I).
No more than three occurrences of M, C, X, or I may appear consecutively, and no more than one D, L, or V may appear at all.
C may prefix M or D; the digits following the prefixed digit represent a value no more than 99. X may prefix C or L but not both; the following digits represent a value no more than 9. 1 may prefix X or V; the prefixed digit must end the numeral.
Roman digits are written in nonascending order (except for prefixes).

What are the alternatives for testing the legality of a Roman numeral?

Two possible ways of approaching this problem might be called the rule violation approach and the rule-confirmation approach. We can test the legality of a string either by making sure it has all the characteristics of a Roman numeral or by describing each step of its construction using rules of building Roman numerals. With the rule-violation approach, we list a number of characteristics of legal Roman numerals and check to see if any of them are violated. With the rule-confirmation approach, we try to build a legal Roman numeral with the same digits as the one in question; if we succeed, then the Roman numeral is legal. The rule-violation approach is appropriate when the set of characteristics to be checked covers all possible cases. The rule-confirmation approach is appropriate when the rules for construction are well specified.

The rule-violation approach starts with the conditions listed above. They describe a legal Roman numeral, but they don't tell us how to build one.

If the conditions are complete, however-and we think they are-any sequence of Roman digits that satisfies them all is legal.

How is the rule-violation version of IsRoman designed?

The structure of the checking function, which we'll call IsRoman, is merely a sequence of tests:

if not IsLegalRomanDigits(line, length):
IsRoman = False
elif not IsLegalCounts(line, length):
IsRoman = False
elif not IsLegalPrefixes(line, length):
IsRoman = False
elif not LegalOrder(line, length):
IsRoman = False
else:
IsRoman = True

Note that all the functions return a positive result. We code them all the same for consistency; we return positive results because people find them easier to understand than functions that return negative results. The code is clearer as a result.

Stop & Consider

How would IsRoman be coded as a single assignment statement whose right side is a boolean expression that uses the and operator? What would be a potential problem with coding it in this way?

How is IsLegalRomanDigits coded?

The IsLegalRomanDigits function merely checks to see if all the characters in the line are 'I', 'V, 'X', 'L', 'C', 'D', or 'M'. This is almost the same as checking that all the characters in the line are 'P', 'Y', 'T', 'H', 'O' or 'N'. We can copy the code from the revised Banners With CLASS program and modify it appropriately.

Stop & Help

Write IsLegalRomanDigits.

The remaining checks look more complicated. There are two sets of rules for Roman digits, one for I, X, C, and M, another for V, L, and D. Even counting the occurrences of a Roman digit is not straightforward; for example, both XXXIX and XXXX contain four X's, but only one of them is legal. A numeral containing X as a prefix can contain only one other X. The other rules about prefixes complicate matters still further.

Stop & Consider

List the Roman numerals up to 49 and describe the rules governing legal placement of X.

How can the prefix tests be simplified?

In earlier case studies, we applied the principle of reducing the problem to problems that are easier to solve. Here, we apply this principle to the analysis: we convert the line to something that's easier to analyze. One approach is to replace each pair consisting of a prefix and the prefixed digit by a single new character. The rule for I, X, C, and M in the new representation is simpler; each can occur at most three times, and occurrences must be consecutive.

Chapter 5

Durk Jan de Bruin

The following code results:

if not IsLegalRomanDigits (line, length):
IsRoman = False
else:
ReplacePrefixes (line, length)
if not IsLegalCounts (line, length):
IsRoman = False
else if not IsLegalOrder (line, length):
IsRoman = False
else:
IsRoman = True

Will it work to substitute single digits for prefix combinations?

Before starting on the design of ReplacePrefixes, we need to make sure that this idea will work. Thus we move to IsLegalCounts and IsLegalOrder, assuming tentatively that the digit pairs IV, IX, XL, XC, CD, and CM have been replaced by the characters '1', '2', '3', '4', '5', and '6' respectively. (Any six characters that aren't Roman digits would be appropriate.)

How do the new digits behave compared with the original Roman digits?

The replacement characters act like V, L, and D in that each can appear only once. They differ from V, L, and D in that the digits that may follow them are constrained. For instance, VI is legal, but 1I and 21 (the transformed versions of IVI and IXI) aren't. Moreover, pairs of replacement characters are also limited. The numbers 1 and 2 can't occur together; nor can 3 and 4 or 5 and 6. We summarize the conditions on all the digits in Table 5.1. Where multiple occurrences of a digit are allowed, they must be consecutive.

How should the error checking functions be separated?

Ideally, we can split error checking into a number of small steps that correspond either to templates or to short, easily designed code segments. The table suggests one such step. IsLegalCount can be coded to count the occurrences of each digit, but shouldn't worry about whether or not multiple occurrences are consecutive. Nonconsecutive occurrences will be caught by the order checking; for instance, MMXM will be flagged as illegal because an M may not follow an X.

How can an array be used to see if two digits are correctly ordered?

Table 5.1 also suggests a way to use arrays to check that digits are correctly ordered. The first column forms an array in which the original Roman digits precede every Roman digit they are allowed to follow. Thus if we look up two consecutive digits in the array

and the second digit occurs earlier in the array than the first, the two digits are in correct order in the Roman numeral.

The next step is to figure out how the new digits fit into this scheme. First, we notice that for the purposes of checking for legality, the digits representing IV and IX act exactly alike, and only one can occur in a given Roman numeral. Thus we only need one digit to represent both IV and IX. The same holds for XL and XC and for CD and CM. That means that only three new digits are needed, not six, which should simplify the code somewhat.

Stop & Help

Verify that XL and CL cannot both occur in the same Roman numeral.

Stop & Help

Rewrite Table 5.1 using the character 'V to represent IV and IX, the character '2' to represent XL and XC, and the character '3' to represent CD and CM.

Second, we observe that placing the new digits in the array as shown below allows them to be checked in almost the same way as the original Roman digits.

Stop & Help

Can V appear in a Roman numeral that has IX or IV in it?

The exception is that each new digit can precede only those digits that occur two or more places before it in the array. The algorithm for checking the original Roman digits works correctly with the new digits added. We identify two conditions to apply to each pair of digits in the Roman numeral: either the first digit is one of the original Roman digits, or it's one of the prefix pairs represented by a 1, 2, or 3.

Chapter 5

Durk Jan de Bruin

How is ReplacePrefixes coded?

Satisfied that the scheme of replacing pairs that contain a prefix and the prefixed digit with single characters will yield a reasonable design, we return to the ReplacePrefixes function. ReplacePrefixes consists merely of looking at every character in the line, checking if it is a prefix, and if so replacing the pair of characters by the appropriate single character.

What template is applicable to the problem of replacing prefixes?

The template "process every element of an array" would seem to be applicable here. The "process" is to check if the element is a prefix and to replace it and its successor element if so. The template suggests the following code:

for k in range(length):
if line[K] is a prefix then replace line[k]
and line[k+1] by a new single digit

Stop & Help

Why does the loop variable not go from 1 to length?

Stop & Predict

The code has a flaw. What is it?

How must the code suggested by the template be modified?

A problem with the code is that replacing a pair of Roman digits with a single character reduces the length of line. As written, the code checks characters past the end of the transformed Roman numeral, since changes to length within the loop do not affect the number of times the loop is executed. One can either code the replacing function carefully so that characters past the end of the numeral cause no harm, or code the loop in such a way that the problem does not arise. We choose the latter approach and code a while loop that simulates the for-loop code.

k = 1
while k < length: # length will be evaluated at every iteration
if line[k] is a prefix then replace line[k]
and line[k + 1] by a new single digit
k = k + 1

How are prefixes detected?

There are three possibilities for a possible prefix, I, X, and C; the "select among alternatives" template suggests coding the body of the loop as if statement.

if line.find("IV") != -1:
line = line.replace("IV", "1")
length = length - 1
if line.find("IX") != -1:
line = line.replace("IX", "1")
length = length - 1
if line.find("XL") != -1:
line = line.replace("XL", "2")
length = length - 1
if line.find("XC") != -1:
line = line.replace("XC", "2")
length = length - 1
if line.find("CD") != -1:
line = line.replace("CD", "3")
length = length - 1
if line.find("CM") != -1:
line = line.replace("CM", "3")
length = length - 1

How are prefixes replaced?

The code uses the name Replace for the digit replacement function. This function will have to receive the line, its length, the position at which the replacement is to occur, and the new character as arguments. After storing the replacement character in the specified position, it must move the subsequent elements down one position. The processing of the replacement of the first prefix in MCMXXIV is illustrated below, with the array cells or variables that change in each step outlined in bold. Storing the new character:

Storing the new character and decrementing the length are easy. Shifting the elements down one position results from tailoring the "process every element of an array" template; the action performed is "copy the next eleement into the current position," and it's applied not to the entire array but only to the elements from the specified position to the next-to-last. Here's the code for Replace:

def Replace(pos, replacement):
global line
global length
k = 0
line[pos] = replacement
for k in range(pos + 1, length):
line[k] = line[k + 1]
length = length - 1

How are the digit counts determined?

We move on to IsLegalCounts. Tallying every Roman digit in the line is an other application of the "process every element of an array" template.

Chapter 5

Durk Jan de Bruin

IsLegalCounts could be organized as ten loops, one to count the occurrences of each Roman digit. There is a better way, however, that uses an array of counts indexed by characters, in the same way as the third version of the Roman Calculator Construction program did. The counting is done by a loop that, for each element of line, adds 1 to the corresponding element of the count array:

for k in range(length):
counts[line[k]] = counts[line[k]] + 1

Here line[k] is used both as an element of an array (line) and as an index into another array (counts).

How is IsLegalCounts coded?

The loop is preceded by initialization of the counts array (only ten elements need be initialized, since line is known to contain only legal Roman digits when IsLegalCounts is called) and followed by testing the counts to make sure they don't exceed the limits. The following code, though somewhat inelegant, is straightforward:

def IsLegalCounts():
global line
global length
k = 0
counts = {
'I' : 0,
'V' : 0,
'X' : 0,
'L' : 0,
'C' : 0,
'M' : 0,
'D' : 0,
'1' : 0,
'2' : 0,
'3' : 0
}
for k in range(length):
counts[line[k]] = counts[line[k]] + 1
LegalCounts = (counts['I'] <= 3) and
(counts['X'] <= 3)
and (counts['C'] <= 3) and (counts['M'] <= 3)
and (counts['V'] <= 1) and (counts['L'] <= 1)
and (counts['D'] <= 1) and (counts['1'] <= 1)
and (counts['2'] <= 1) and (counts['3'] <= 1)
return LegalCounts

Stop & Consider

What makes the code inelegant? Suggest a way to improve it.

How is the order of the digits checked?

Finally, we consider IsLegalOrder. Here's the pseudocode for this function:

for k in range(length):
if line[k + 1] can't legally follow line[k],
then IsLegalOrder = False

The digit line[k + 1] can legally follow one of the original Roman digits in line[k] if line[k+1]'s position in the array of all the Roman digits precedes line[k]'s position in that array. A digit can follow '1', '2', or '3' if its position in the array of all the Roman digits is at least three before the '1', '2', or '3'.

How is the position of a digit in the array of digits located?

A function to search an array of characters for a particular character is required. A straightforward implementation uses the "process every element of an array" (again!):

def Location (ch, line, length):
k = 0
for k in range(length):
if line[k] == ch:
Location = k
return Location

What's wrong with examining every array element?

One problem with applying the "process every element of an array" template is that it may not be desirable to process every element, especially in long arrays. The Location function just coded will continue to check characters in the string, even after it has found what it's looking for. The operation of searching an array appears frequently, and it is likely that we will be able to recycle Location's code in future programs. For long strings, however, we will want a more efficient version, so we choose to invest a bit more time here to produce it.

What code searches only as far in the array as necessary?

The code will include a loop that checks elements of the string argument. The loop should keep going either until the desired value is found or until all elements of the array have been checked. The situation is similar to the blank-skipping process we encountered in the previous section: there were two ways to end the loop, either by finding a nonblank or by encountering the end of the line. Here is that code:

done = False
while not done:
if ch == BLANK:
done = True; # no more line to check
else:
ch = input() # get next character
if ch != BLANK:
done = True # found what we're looking for

And here is how we modify it:

k = 0
done = False
while not done:
if k > length1:
done = True #(no more array elements to check)
elif line1[k] == ch:
done = True #(found what we're looking for)
else:
k = k + 1 #(move to next array element)

In both code segments, the first thing to check is whether there were more elements to compare. Subsequent steps were rearranged because there is no need to "read" the next element of an array.

Chapter 5

Durk Jan de Bruin

Both ways of exiting the search loop should be accounted for before the Location function returns; something will have to be returned if the specified value is not found in the array. This should not happen in this program, since earlier code has ensured that all the elements of the Roman numeral are Roman digits. As a debugging aid, however, we'll add code to print an error message if the character isn't in the string.

How is IsLegalOrder coded?

If LineType is declared as a array of characters. Standard Python allows the use of a string constant as the LineType argument. (Most versions of Python allow somewhat more flexible use of strings.) We take advantage of this shorthand notation and package the code into functions as follows:

def InOrder(romanDigit1, romanDigit2):
ALLDIGITS = 'IV1XL2CD3M'
if (romanDigit1 == '1') or (romanDigit1 == '2')
or (romanDigit1 == '3'):
InOrder = Location(romanDigit1, ALLDIGITS, 10)
>= Location(romanDigit2, ALLDIGITS, 10) + 3
else:
InOrder = Location(romanDigit1, ALLDIGITS, 10)
>= Location(romanDigit2, ALLDIGITS, 10)
return InOrder

def IsLegalOrder():
global line
global length
k = 0
LegalOrder = True
for k in range(length-1):
if not InOrder(line[k], line[k + 1]):
LegalOrder = False
return LegalOrder

Note that IsLegalOrder checks all consecutive pairs of elements in the array, even though it could stop early if there's an error. The difference between this situation and that of searching an array is that (a) we expect to check the whole array if the Roman numeral is legal-inefficiency only appears in the error case-and (b) this is a special-purpose routine intended for handling relatively short Roman numerals, so straightforward coding is more important than efficiency.

The complete set of functions appears in the Python Code section.

How is the code for the rule violation version tested?

All the functions can easily be tested individually and in combination. We include if DEBUGGING then ... code in the more complicated routines to print information about the values they return: the line with prefixes removed by RemovePrefixes, the contents of the count array in IsLegalCounts, the position of out-of-order digits in IsLegalOrder, and the error check that failed in IsRoman. A less experienced programmer may wish to add debugging output to the other functions as well.

The search and replace functions can be tested with data that deal with the typical situation, involving something in the middle of the array, and the extreme situations, involving the ends of the array. IsLegalCounts should be tested with Roman numerals that contain all the digits, with the maximum number and one greater than the maximum number of occurrences.

Why are test results not so convincing for this code?

Once we've tested the individual functions, what then? In this problem, knowing that the various components of the program work is not as encouraging as it was in the other case studies. We have certainly eliminated a collection of illegal Roman numerals-we've caught those with illegal digits, those with too many of a particular digit, and those in which consecutive pairs of digits are out of order-but there still remains the possibility that we forgot to check some subtle illegality, perhaps involving digits that are two or more positions apart in the Roman numeral.

The application of a spelling check is analogous; with a complex system of rules such as those for English spelling, it is not always possible to list every special case. Our situation is represented in diagram form (similar to an illustration earlier in this case study) as follows:

The flaw inherent in the rule-violation solution to the problem is that the resulting program is only as good as the description, which may itself be incomplete. We see that straightforward code alone is not sufficient evidence of correct code. The alternative approach to checking a Roman numeral, to be discussed next, will shed some light on how to generate a good set of test data.

Reflection

5.20 What test results would be sufficient to convince you that the function detected all illegal Roman numerals?

Reflection

5.21 What test results would be sufficient to convince you that the function correctly identified a legal Roman numeral as legal?

Debugging

5.22 Insert a bug in the code that causes the function to identify an illegal Roman numeral as legal, and get a fellow programmer to find the bug. Do the same with a bug that causes the function to identify a legal Roman numeral as illegal. Which bug is easier to find?

Chapter 5

Durk Jan de Bruin

Another Way to Modify Roman Calculator Construction

What are the benefits of the rule-confirmation approach?

The rule-confirmation approach to checking a Roman numeral is potentially more encouraging. A program based on the rules for constructing a Roman numeral could reveal either the rule that was violated by an illegal numeral or the steps by which the Roman numeral was legally constructed. The latter is much more convincing evidence that no subtle error conditions were overlooked. We turn now to this approach to the design of IsRoman, focusing on the rules for building Roman numerals.

What are the rules for constructing Roman numerals?

The digits of a Roman numeral can be grouped into sequences of the same digit, prefix combinations, and single digits. For example, suppose that a Roman numeral representing a value of 3999 or less starts with some number of M's. Then the M's can be grouped, and what follows them has to represent a value of 999 or less. Continuing the process, a value of 999 or less might start with the prefix combination CM. Those two Roman digits can be grouped, and what follows them must represent a value of 99 or less. The diagram below shows the digit grouping for the numeral MMMCMLXIV:

How can the rules help with the design?

One way to use these rules as a basis of a program is to design a collection of boolean functions that, when given a line and a position in the line, return true exactly when the contents of the line starting at the given position represent a certain kind of Roman numeral. One might, for instance, have thirteen functions, one for each rule, named IsLegalAtMost3999,IsLegalAtMost999, IsLegalAtMost899, and so on.

How do the rules suggest a state-based design?

Another way is to organize the program as a loop that moves from state to state as it advances through the line, depending on what characters the line contains. A state is a situation. In this program, a state might represent the situation "I'm looking for a Roman numeral representing a value at most 89." In a state-based program, the contents of a variable indicates what state the program is in. The loop body is essentially a case statement, one case for each state, in which the next character in the line is examined and the state updated. In pseudocode, using variables named state and linePos (the position in line), it looks like this:

state = the start state
linePos = 1
while state isn't the "done" state or the "error" state, do the following:
"looking for a value at most 3999":
examine line[linePos] and set state to a new state value
"looking for a value at most 999":
examine line[linePos] and set state to a new state value
"looking for a value at most 3":
examine line[linePos] and set state to a new state value

One may view a state-based program as similar to a house. Rooms in the house are analogous to states in the program; wandering from room to room is like changing the state variable from one value to another.

Neither one of these approaches looks too encouraging at first glance; the idea of a large collection of boolean functions or a large cose statement seems intimidating, involving too much "brute force" and not enough "elegance." Sometimes, however, it is possible to exploit commonalities in code through clever use of functions and arrays. Table 5.2 suggests that several of the state processing segments will resemble one another. Thus, we will attempt to use a state-based framework as a basis for Roman numeral checking.

Chapter 5

Durk Jan de Bruin

How can a collection of states and state transitions be pictured?

When working with a state-based program, it helps to create a state diagram to keep track of all the possible situations and paths between them. The state diagram represents the different states by boxes and the ways to get from state to state-the transitions-by labeled arrows. Analogously, players of computer adventure games create maps of the various "rooms" of the adventure and the ways to get from room to room. The state diagram for the Roman numeral checker appears in Figure 5.3

How is a value of 3999 or less checked?

Back to the program design. To start, we pick a state-the first one, "looking for a value at most 3999," seems as good as any-and see how the associated code might look. According to the table, checking if a Roman numeral represents a value of 3999 or less involves first isolating the leading M's, then analyzing the remaining Roman digits via one of the other rules. In pseudocode, we have

Count the leading M's
If number of M's > 3, then state = "error detected",
since the Roman numeral isn't legal.
If the Roman numeral is all M's, then return true.
Otherwise, return the result of checking whether
the rest of the Roman numeral represents a value at most 999.

How is the number of leading M's found?

Determining the number of leading M's is the same as finding the position of the first non-M. We've already written code that does almost exactly that: the Locotion function in the previous section. Here's the code, modified for the purposes of the current application:

k = linePos
nonMfound = False
while not nonMfound:
if k > length:
nonMfound = True
elif line[k] != 'M':
nonMfound = True
else:
k = k + 1

How is the result of the M-skipping loop used?

Once the loop terminates, the variable k contains the position of the first non-M, so the expression k - linePos represents the number of leading M's. This should be at most 3. If the Roman numeral is all M's, it is legal; otherwise, the state "looking for a value at most 999" is entered to check the Roman numeral that starts at position k. The following code implements these steps.

if k - linePos > 3:
state = "the error state"
else if k > length:
state = "the success state"
else:
linePos = k
state = "looking for a value at most 999"

Stop & Help

What other states will be handled in a similar way?

How should states be represented in Python?

It makes sense here to pick a representation for states. One choice is to have a state be an integer value: "looking for a value at most 3999" would be 1, "looking for a value at most 999" 2, and so on. It would be difficult for someone reading the code to keep track of how states and integers were associated, however. A better idea is to refer to states somehow by names.

Recall that in The Calendar Shop Python's const facility allowed months to be referred to by name instead of by number. We can do the same thing here, defining constants ERROR, SUCCESS, ATMOST3999, ATMOST999, and so on down to ATMOST3. The main loop body is then

Chapter 5

Durk Jan de Bruin

state = ATMOST3999
linePos = 1
while (state != SUCCESS) and (state ! =ERROR):
if state == ATMOST3999:
examine line[linePos] and set state to a new state value
if state == ATMOST999:
examine line[linePos] and set state to a new state value
if state == ATM0ST3:
examine line[linePos] and set state to a new state value

and the code for handling state ATMOST3999 will be

k = linePos
nonMfound = False
while not nonMfound:
if k > length:
nonMfound = True
elif line[k] != 'M':
nonMfound = True
else:
k = k + 1
if k - linePos > 3:
state = ERROR
elif k > length:
state = SUCCESS
else:
linePos = k
state = ATMOST999

How is a value of 999 or less checked?

We move on to design eode to handle the next state, ATMOST999. Table 5.2 indicates that either such a Roman numeral starts with CM, or it represents a value of 899 or less.

The code is easy. First it checks that there are at least two Roman digits left in the line; if not, then the numeral doesn't represent anything in the 900's, but it may represent a smaller value. If the first is C and the next is M, the rest must represent a value at most 99. Otherwise, we look for a value at most 899.

if linePos + 1 > length:
state = ATMOST899
elif (line[linePos] == 'C') and (line[linePos + 1] == 'M'):
state = ATMOST99
linePos = linePos + 2
else:
state = ATMOST899

Stop & Help

What would happen if the check for two remaining digits was omitted?

How is a value of 899 or less checked?

Next, we handle ATMOST899. The code should check for a leading D after making sure there are Roman digits remaining in the line. (No more remaining means success.) The digits following a D must represent a value of 399 or less; if there is no leading D, the existing digits must represent a value of 499 or less. The code is similar to that for ATMOST999:

if linePos > length:
state = SUCCESS
elif (line[linePos] == 'D'):
state = ATMOST399
linePos = linePos + 1
else:
state ATMOST499

How are the other states coded?

Continuing with the remaining functions, we find that the code to handle ATMOST499, ATMOST49, and ATMOST4 are all very similar. We also find close similarities among ATMOST3999, ATMOST399, ATMOST39, and ATMOST3, among ATMOST999, ATMOST99, and ATMOST9, and among ATMOST899, ATMOST89, and ATMOST8.

The only tricky part comes in ATMOST9 and ATMOST4, when the sequences IX and IV must be the last characters on the line. We handle this by adding a state called ATMOST0. Code to handle this state consists merely of checking that linePos is equal to length + 1; if so, state = SUCCESS, otherwise state = ERROR.

Stop & Help

Write the code for ATMOST3.

How can the code be simplified?

We now have a lot of code. We'd like to have less. First, we examine the similarities among the code for ATMOST3999, ATMOST399, ATMOST39, and ATMOST3. The search for a non-M or non-C or whatever can be packaged in a function, in the same way the Locotion code was packaged in the previous section. We'll call the function PositionOfNon; it takes as arguments the line, its length, the current position, and the character to search for something other than. After inserting a call to that function, we have the following pattern:

k = PositionOfNon(line, length, linePos, 1):
if k - linePos > 3:
state = ERROR
elif k > length:
state = SUCCESS
else:
linePos = k
state = 2

1 stands for a Roman digit; 2 stands for a state name. The table below indicates what should be substituted for 1 and 2 in each case.

Chapter 5

Durk Jan de Bruin

Current State	To be substituted for 1	To be substituted for 2
ATMOST3999	'M'	ATMOST999
ATMOST399	'C'	ATMOST99
ATMOST39	'X'	ATMOST9
ATMOST3	'I'	ATMOST0

How can the code pattern be represented in a function?

Thus we can make a function called Update3x, called whenever we are looking for a value at most 3999, 399, 39, or 3:

def Update3x(maxDigits, line, length, linePos):
global state
k = 0
k = PositionOfNon(line, length, linePos,
TENDIGITS[maxDigits])
if k - linePos > 3:
state = ERROR
elif k > length:
state = SUCCESS
else:
linePos = k
state = STATE9
maxDigits = maxDigits - 1

Similar functions called Update9x, Update8x, and Update4x can be written to unify the code for states ATMOST999, ATMOST99, and ATMOST9, for ATMOST899, ATMOST89, and ATMOST8, and for ATMOST499, ATMOST49, and ATMOST4. The main loop is then coded as follows:

state = ATMOST3999
linePos = 1
while (state != ERROR) and (state != SUCCESS):
if state == ATMOST3999:
Update3x(maxDigits, line, length, linePos, 'M', state, ATMOST999)
if state == ATMOST999:
Update9x(maxDigits, line, length, linePos, 'C', 'M', state, ATMOST99, ATMOST899)
if state == ATMOST899:
Update8x(maxDigits, line, length, linePos ,'D', state, ATMOST399, ATMOST499)
if state == ATMOST499:
Update4x(maxDigits, line, length, linePos 'C', 'D', state, ATMOST99, ATMOST399)
if state == ATMOST399:
Update3x(maxDigits, line, length, linePos, 'C', state, ATMOST99)
if state == ATMOST99:
Update9x(maxDigits, line, length, linePos, 'X', 'C', state, ATMOST9, ATMOST89)
if state == ATMOST89:
Update8x(maxDigits, line, length, linePos ,'L', state, ATMOST39, ATMOST49)
if state == ATMOST49:
Update4x(maxDigits, line, length, linePos 'X', 'L', state, ATMOST9, ATMOST39)
if state == ATMOST39:
Update3x(maxDigits, line, length, linePos, 'X', state, ATMOST9)
if state == ATMOST9:
Update9x(maxDigits, line, length, linePos, 'I', 'X', state, ATMOST0, ATMOST8)
if state == ATMOST9:
Update8x(maxDigits, line, length, linePos ,'V', state, ATMOST3, ATMOST4)
if state == ATMOST4:
Update4x(maxDigits, line, length, linePos 'I', 'V', state, ATMOST0, ATMOST3)
if state == ATMOST3:
Update3x(maxDigits, line, length, linePos, 'I', state, ATMOST0)

How can function parameters be eliminated?

Next we notice that the "at most" value corresponding to a given state can be computed from the value for the previous state, and that furthermore the Roman digit(s) to be checked for can be computed from that value. Table 5.4 shows the possibilities for going from one "at most" value to the next.

Table 5.4 suggests that a state should be represented not by a word, but by the maximum value to be found in that state. Its digits can be counted and separated using techniques similar to those described in Check That Number! We still need a state variable, but now it will have only three values: INPROCESS, SUCCESS, and ERROR.

Chapter 5

Durk Jan de Bruin

How can an array be used to associate the "at most" value with a Roman digit?

Furthermore, two arrays of Roman digits can be used to look up the digits to search for. One array, which we'll call TENDIGITS, will store powers of 10: 'I', 'X', 'C, and 'M'. The other, which we'll call FIVEDIGITS, will store digits representing 5 times the corresponding power of 10: 'V, 'L', and 'D'. The code for Update3x is then

def Update3x(maxValue, line, length, linePos):
global state
k = 0
k = PositionOfNon(line, length, linePos,
TENDIGITS[maxValue])
if k - linePos > 3:
state = ERROR
elif k > length:
state = SUCCESS
else:
linePos = k
maxValue = AllButFirstDigit(maxValue)

In fact, we can avoid the need for functions to count and remove digits by splitting maxValue into two components: its first digit and the number of digits it has. We incorporate the first digit back into the state variable to get the final set of functions in the Python Code section.

How is the code tested?

We now proceed to test the program. LocotionOfNon can of course be tested separately. The various state-handling routines are probably best tested together, since they are so closely connected.

How can the state diagram help generate test data?

To test the state-based code, we return to the black-box approach. Testdata that exercise every statement in the code will be data that test all transitions, or movements, from state to state. Thus tests for all possible transitions from state to state must be invented. As we test Roman numerals, we can check off the transitions they exercise on the state diagram to make sure we've covered them all.

This set of data is also useful for testing the Roman numeral checking function designed using the rule-violation approach. Since these data essentially cover all the ways to build a legal Roman numeral, they may reveal checks that were overlooked in designing that program.

Stop & Help

Generate a set of test data, and test both versions IsRoman with it.

As in previous programs, we install if DEBUGGING then ... code to provide debugging output. Here, the interesting information is the sequence of states that the program runs through to check the numeral, and the pieces of the Roman numeral being checked at each state.

Analysis

5.23 Trace the execution of IsRoman on XLIX, XIXI, and XAXII.

Analysis

5.24 Would adding a check that the string consists solely of Roman digits allow the code for the rest of IsRoman to be simplified? Explain.

Debugging

5.25 Insert a bug in the code that causes the rule-confirmation version of IsRoman to identify an illegal Roman numeral as legal, and get a fellow programmer to find the bug. Do the same with a bug that causes the function to identify a legal Roman numeral as illegal. Which bug is easier to find?

Modification

5.26 Modify the rule-confirmation version of IsRoman to enter an error state immediately on encountering a character that's not a Roman digit.

Analysis

5.27 In what other situations is it possible to detect an error before processing the entire string?

Analysis

5.28 PositionOfNon is perhaps not the best name for a function that finds the position of an element not equal to a given value. Invent a better name for this function.

Outline of Design and Development Questions

These questions summarize the main points of the commentary.

Checking for Legal Input
What changes are needed to guard against illegal data?
How can the code from The Calendar Shop be recycled?
What template is represented in this decomposition?
What type of variable should be used for input?
How much should be read at a time?
What definitions and declarations are used?
How is a line read?
What if the line is longer than the array?
What is the right length for LINELENGTH?
How does detecting legal values fit into a function?
What happens if the input is illegal?

Chapter 5

Durk Jan de Bruin

Modifying the Check That Number! Program
What are illegal inputs to Check That Number?
What template is used for checking the digits?
What are good test cases?

Modifying Banners With CLASS and The Calendar Shop
What are illegal inputs to Banners With CLASS and The Calendar Shop"?
How can the letters in Banners With CLASS be checked?
How can bounded integers be checked?
What are alternatives for decomposing the input and testing of an integer?
Which decomposition should be chosen?
How are the nonblank characters in a line read?
How can a while loop be used to test for blanks and end-of-line?
How is the rest of the line read into the line array?
What must be done to the array-filling code to make it usable?
How is line checked to consist only of digits?
How are the digits in line converted to an integer?
How are digit values combined to get an integer?
How are the various code segments combined?
How is the code summarized?
How should the code be tested and debugged?
What code should be tested first?
What test values should be used?
How should the remaining code be tested?

One Way to Modify Roman Calculator Construction
What should be the first step of the modified ReadRoman?
What are illegal Roman numerals?
What are the alternatives for testing the legality of a Roman numeral?
How is the rule-violation version of IsRoman designed?
How is IsLegalRomanDigits coded?
How can the prefix tests be simplified?
Will it work to substitute single digits for prefix combinations?
How do the new digits behave compared with the original Roman digits?
How should the error-checking functions be separated?
How can an array be used to see if two digits are correctly ordered?
How is ReplacePrefixes coded?
What template is applicable to the problem of replacing prefixes?
How must the code suggested by the template be modified?
How are prefixes detected?
How are prefixes replaced?
How are the digit counts determined?
How is IsLegalCounts coded?
How is the order of the digits checked?
How is the position of a digit in the array of digits located?
What's wrong with examining every array element?
What code searches only as far in the array as necessary?
How is IsLegalOrder coded?
How is the code for the rule-violation version tested?
Why are test results not so convincing for this code?

Another Way to Modify Roman Calculator Construction
What are the benefits of the rule-confirmation approach?
What are the rules for constructing Roman numerals?
How can the rules help with the design?
How do the rules suggest a state-based design?
How can a collection of states and state transitions be pictured?
How is a value of 3999 or less checked?
How is the number of leading M's found?
How is the result of the M-skipping loop used?
How should states be represented in Python?
How is a value of 999 or less checked?
How is a value of 899 or less checked?
How are the other states coded?
How can the code be simplified?
How can the code pattern be represented in a function?
How can function parameters be eliminated?
How can an array be used to associate the "at most" value with a Roman digit?
How is the code tested?
How can the state diagram help generate test data?

Programmers' Summary

This entire case study is an application of the Persecution Complex principle, a realization that nasty or confused users are liable to type just about anything as input to a program, and that a well-written program is able to handle such illegal input gracefully. Error-checking code is added to the programs for each of the earlier case studies, to ensure that a numeric value is provided in Check That Number!, Banners With CLASS, and The Calendar Shop, a correct set of letters is provided in Banners With CLASS, and a correct Roman numeral is provided in Roman Calculator Construction.

Error checking in all cases begins with reading a line of input into an array of characters. Two straightforward tests are then made, ensuring that the length of the input is appropriate and that it consists solely of relevant characters.

Chapter 5

Durk Jan de Bruin

For integer input, there is a complication and an extra step: to be consistent with Python's input functions, our program has to ignore blanks before the integer. After making sure that the nonblank characters are all digits, it then has to convert the characters to an integer value and check that the value is within specified bounds.

Detecting errors in a Roman numeral is more complicated, due to the as sortment of rules defining the structure of a Roman numeral. Two alternatives for error checking are explored: a rule-violation approach, in which the code checks successively for a number of different characteristics of an illegal numeral; and a rule-confirmation approach, in which the code attempts to construct a legal Roman numeral with the same digits as the input and signals error if it can't. The rule-violation approach decomposes the error-checking process into a number of tests that apply common array templates, and it employs the technique of transforming the input into something easier to handle. The result is a program that is relatively easy to understand and develops incrementally. However, the approach has a flaw:one really can not be sure that all characteristics of an illegal Roman numeral have been checked by the code.

The rule-confirmation approach is based on a set of thirteen rules for constructing Roman numerals. From these rules, a state-based program is designed that moves from state to state as it recognizes Roman numerals in successive segments of the input line. The design starts with code segments for each rule; by noticing similarities in the code, we are able to design four state-processing functions that can be repeatedly called to recognize the numeral. The state diagram, an example of an alternate representation, is used to devise a comprehensive set of test data that display all the ways a Roman numeral can be built.

Testing of the code mostly uses the black-box approach. Extreme cases are determined by finding numeric quantities associated with the input, such as its length and the number and position of interesting characters (digits, nondigits, leading blanks, or nonblanks) it contains. Test data are chosen that maximize or minimize legal and illegal values of these quantities. In the rule-violation version of the Roman numeral checker, test data are derived from the state diagram, chosen to exercise all transitions from state to state. In all the programs, the Persecution Complex principle is applied by adding debugging output that displays the values of relevant variables.

Applying the Recycling principle, the case study displays many examples of code reuse and template application. The main error-checking loop is copied from The Calendar Shop. Conversion of characters to numeric values is copied from Check That Number! The "process every element of an array" template, introduced in Roman Calculator Construction, is applied in several ways: checking that the input characters are all digits or all letters drawn from 'P', 'Y', 'T', 'H', 'O' and 'N'; accumulating the numeric value of a string of digits; shifting array elements down one position; tallying each array element in a counts array; and checking that each consecutive pair of Roman digits is properly ordered. A second array template, "searching an array," is derived from the code that reads a line into an array and then is applied to find the location of a particular Roman digit and to skip through a sequence of M's, C's, X's, or I's. The line-reading code is also recycled to produce the code that skips through blanks in the input and the code used to construct and check an integer value. Finally, the "select from alternatives" template was recycled to produce code that handles the various Roman digits in different ways.

On several occasions a choice is made between code that perhaps needlessly checks every element of an array and more complicated code that terminates when an array element with specified properties is found. Two criteria guide the choice: Is efficient code really necessary? Is the extra effort in designing the more complicated code likely to pay off in future applications?

Arrays are used in interesting ways. One use is to represent precedence constraints on Roman numerals: two Roman digits are correctly ordered if one precedes the other in a special array of Roman digits. Another pair of arrays is used to compute, from the number of digits in the argument to one of the state-handling functions, the Roman digits to look for in that function.

Design of the integer input function involves a choice from three decompositions. Template application suggests one choice; the desire to be consistent with Python's handling of integer input leads to another. The two-part input routine, which first skips blanks and then fills the line array with nonblank characters, requires careful attention to matching up the postcondition of the blank-skipping loop with the precondition of the arrayfilling loop.

The Literacy principle appears throughout in our choice of informative variable and function names. It is applied in the use of the const facility to give names to states in the rule-confirmation Roman numeral checker. This principle is also applied in the rule-violation Roman numeral checker and in code derived from the "search an array" template, by using positive rather than negative boolean values to make the code more understandable.

Making Sense of Is It Legal?

Analysis

5.29 What categories of test cases are necessary to determine whether checks for illegal data are successful?

Chapter 5

Durk Jan de Bruin

Reflection

5.30 Why are there so many opportunities to recycle other solutions in this case study?

Analysis

5.31 What categories of illegal data are detected by the functions in this case study?

Application

5.32 Provide a real-world situation in which errors are sometimes checked with the rule-violation approach and sometimes with the rule-confirmation approach.

Application

5.33 Which method makes more sense for checking that a Python program is syntactically correct, the rule-violation method or the rule-confirmation method?

Reflection

5.34 Which version of IsRoman do you find easier to understand? Which version of IsRoman do you find easier to believe is correct?

Modification

5.35 Modify each version of IsRoman to handle two more Roman digits, T, meaning 10000, and F, meaning 5000. M can prefix T to represent 9000 or F to represent 4000. In other ways, T is similar to M, C, X, and I, and F is similar to D, L, and V.

Modification

5.36 The problem statement specifies that errors be detected in input to the programs of the earlier case studies. However, it does not ask that the errors be diagnosed. Quality software will not only inform the user that an error has occurred, but it will also try to explain the circumstances of the error. Identify opportunities in which more details could be provided about illegal input in each of the earlier programs, and modify the programs to produce diagnostic output.

Application

5.37 Write a program called GuessMyNumber that asks the user to guess a number between two randomly generated values. Check for illegal (for example, out of bounds) input, and print informative messages.

Revised Check That Number! Program

LINELENGTH = 4
LineType = ''
line = ''
length = 0 # length of the input line

""" Read and return a line of up to LINELENGTH characters. The length of the line will be returned in length; if more than LINELENGTH characters are provided on the line, excess characters are discarded. """

def ReadLine():
global length
global line
ch = ''
ch = input()
length = len(ch)
if length <= LINELENGTH:
line = ch

""" Return true exactly when c is a digit. """
def IsDigit(c):
IsDigit = c.isnumeric()
return IsDigit

""" Return true exactly when line represents a legal identification number."""

def IsLegal():
global line
global length
k = 0
IsLegal = True
if length != 4:
IsLegal = False
else:
for k in range(4):
if not IsDigit(line[k]):
IsLegal = false
return IsLegal

""" Validate the id number in line by checking that its check digit is correct. """

def Validate():
global line
global length
digit1 = digit2 = digit3 = checkDigit = 0
digit1 = int(line[0]) - int('0')
digit2 = int(line[1]) - int('0')
digit3 = int(line[2]) - int('0')
checkDigit = int(line[3]) - int('0')
if (digit1 + digit2 + digit3) % 7 == checkDigit:
print('The number is valid.')
else:
print('*** The number is invalid.')

# Main Program

Legal = False
while not Legal:
print('Please type the four-digit number to be validated: ', end = '')
ReadLine()
if not IsLegal():
print('You must type a four-digit identification number.')
else:
Legal = True
Validate()

Chapter 5

Durk Jan de Bruin

Revised Banners With CLASS Program

NUMLETTERS = 6
LINELENGTH = 7
BLANK = ' '
LineType = ''
letterSize = 0
letterNum = 0
length = 0
line = ''

""" Read a line after skipping leading blanks. Nonblank characters beyond the LINELENGTH'th nonblank read
after the initial sequence of blanks are discarded. """

def ReadTrimmedLine():
global length
global line
ch = ''
ch = input()
length = len(ch)
if length <= LINELENGTH:
line = ch

""" Return true exactly when line contains only digits. """

def IsAllDigits(c):
global line
IsAllDigits = True
IsAllDigits = line.isnumeric()
return IsAllDigits

""" Return the integer value of the contents of line, which must consist of all digits. """

def IntegerValue():
value = int(line)
return value

""" Return true exactly when n is in the range 5,..., 24 """

def IsInRange(n):
IsInRange = (n >= 5) and (n <= 24)
return IsInRange

""" Read and return an integer between 5 and 24; keep prompting until the user supplies one. """

def ReadInteger(n):
error = False
done = False
global line
global length
while not done:
error = False
print('Type a size for the letters between 5 and 24 : ', end = '')
ReadTrimmedLine()
if (length == 0) or (length > LINELENGTH):
error = True
elif not IsAllDigits(line):
error = True
else:
n = IntegerValue()
if not IsInRange(n):
error = True
else:
done = True
if error:
print('You must type an integer between 5 and 24.')
return n

""" Read and return a line of up to LINELENGTH characters. The length of the line will be returned in length;
if more than LINELENGTH characters are provided on the line, excess characters are discarded """

def ReadLine():
global length
global line
ch = ''
ch = input()
length = len(ch)
if length <= LINELENGTH:
line = ch

""" Return true exactly when c is a digit. """

def IsDigit(c):
IsDigit = c.isnumeric()
return IsDigit

""" Return true exactly when c is one of PYTHON. """

def IsInPYTHON(c):
IsInPYTHON = (c == 'P') or (c == 'p')
or (c == 'Y') or (c == 'y')
or (c == 'T') or (c == 't')
or (c == 'H') or (c == 'h')
or (c == 'O') or (c == 'o')
or (c == 'N') or (c == 'n')
return IsInPYTHON

""" Return true exactly when line represents a legal identification number. """

def IsLegal():
k = 0
IsLegal = True
if length != NUMLETTERS:
IsLegal = False
else:
for k in range(NUMLETTERS):
if not IsInPYTHON(line[k]):
IsLegal = False
return IsLegal

""" Read a line consisting of the right number of PYTHON. Keep prompting until the user provides such a line. """

def ReadLetters():
while not IsLegal():
print('Type {} letters, chosen from PYTHON: '. format(NUMLETTERS))
ReadLine()
if not IsLegal():
print('Either you haven\'t typed {} characters, or one of them isn\'t PYTHON.'. format(NUMLETTERS))

""" letter drawing procedures go here """

Chapter 5

Durk Jan de Bruin

# Main Function

letterSize = ReadInteger(letterSize)
ReadLetters()
for letterNum in range(NUMLETTERS):
if line[letterNum] == 'P' or line[letterNum] == 'p':
DrawP(letterSize)
if line[letterNum] == 'Y' or line[letterNum] == 'y':
DrawY(letterSize)
if line[letterNum] == 'T' or line[letterNum] == 't':
DrawT(letterSize)
if line[letterNum] == 'H' or line[letterNum] == 'h':
DrawH(letterSize)
if line[letterNum] == 'O' or line[letterNum] == 'o':
DrawO(letterSize)
if line[letterNum] == 'N' or line[letterNum] == 'n':
DrawN(letterSize)

Code for the Rule-Violation Approach

line = ''
length = 0
DEBUGGING = False

""" Return true exactly when all the digits in the line are legal Roman digits."""

def AllLegal RomanDigits():
global line
global length
k = 0
AllLegal RomanDigits = True
for k in range(length):
if not (line[k] in ['I', 'V', 'X', 'L', 'C', 'D', 'M'] ):
AllLegal RomanDigits = False
if length == 0:
AllLegalRomanDigits = False
return AllLegal RomanDigits

""" Replace the *two characters* in the line starting at the given position by the *single* replacement character. The length of the line is reduced by 1 as a result.
Precondition: pos < length """

def Replace(pos, replacement):
global line
global length
k = 0
line[pos] = replacement
for k in range(pos + 1, length):
line[k] = line[k + 1]
length = length - 1

""" Replace each pair of Roman digits that starts with a prefix by a single character. (IV and IX are replaced by 1; XL and XC are replaced by 2; CD and CM are replaced by 3). """

def ReplacePrefixes():
global line
global length
if line.find("IV") != -1:
line = line.replace("IV", "1")
length = length - 1
if line.find("IX") != -1:
line = line.replace("IX", "1")
length = length - 1
if line.find("XL") != -1:
line = line.replace("XL", "2")
length = length - 1
if line.find("XC") != -1:
line = line.replace("XC", "2")
length = length - 1
if line.find("CD") != -1:
line = line.replace("CD", "3")
length = length - 1
if line.find("CM") != -1:
line = line.replace("CM", "3")
length = length - 1
if DEBUGGING:
print('*** After prefix substitution: ', end = '')
print(line, length)

""" Return true exctly when there are at most 3 occurrences of I, X, C, and M, and at most 1 occurrence of V, L, D, 1, 2, and 3 in ven line. (1 represents the pair IV or IX; 2 represents the pair XL or XC; 3 represents the pair CD or CM.)
Precondition: line consists only of Roman digits, 1, 2, and 3. """

def LegalCounts():
global line
global length
k = 0
counts = {
'I' : 0,
'V' : 0,
'X' : 0,
'L' : 0,
'C' : 0,
'M' : 0,
'D' : 0,
'1' : 0,
'2' : 0,
'3' : 0
}
for k in range(length):
counts[line[k]] = counts[line[k]] + 1
LegalCounts = (counts['I'] <= 3) and (counts['X'] <= 3)
and (counts['C'] <= 3) and (counts['M'] <= 3)
and (counts['V'] <= 1) and (counts['L'] <= 1)
and (counts['D'] <= 1) and (counts['1'] <= 1)
and (counts['2'] <= 1) and (counts['3'] <= 1)
return LegalCounts

""" Return the position in the line of the given character. Precondition: ch occurs somewhere in line. """

def Location(ch, line1, length1):
k = 0
done = False
while not done:
if k > length1:
done = True #(no more array elements to check)
Location = 0
if DEBUGGING:
print('*** character {} not found in line!'.format(ch))
elif line1[k] == ch:
done = True #(found what we're looking for)
Location = k
else:
k = k + 1 #(move to next array element)
return Location

Chapter 5

Durk Jan de Bruin

""" The two Roman digits occur consecutively in a Roman numeral being checked for legality, whose prefixes (along with the digits they prefix) have been replaced by the digits 1, 2, and 3. Return true exactly when the two Roman digits are in the correct order. (M, if it occurs, must precede all other digits; 3 (meaning CM or CD), if it occurs, must come next; and so on. """

def InOrder(romanDigit1, romanDigit2):
ALLDIGITS = 'IV1XL2CD3M'
if (romanDigit1 == '1') or (romanDigit1 == '2') or (romanDigit1 == '3'):
InOrder = Location(romanDigit1, ALLDIGITS, 10) >= Location(romanDigit2, ALLDIGITS, 10) + 3
else:
InOrder = Location(romanDigit1, ALLDIGITS, 10) >= Location(romanDigit2, ALLDIGITS, 10)
return InOrder

""" Return true if consecutive Roman digits in line are ordered correctly. """

def LegalOrder():
global line
global length
k = 0
LegalOrder = True
for k in range(length-1):
if not InOrder(line[k], line[k + 1]):
LegalOrder = False
return LegalOrder

""" Return true exactly when line contains a legal Roman numeral. """

def IsRoman():
IsRoman = False
if not AllLegalRomanDigits():
if DEBUGGING:
print('*** failed digit check')
IsRoman = False
else:
ReplacePrefixes()
if not LegalCounts():
IsRoman = False
if DEBUGGING:
print('*** failed count check')
elif not LegalOrder():
IsRoman = False
if DEBUGGING:
print('*** failed order check')
else:
IsRoman = True
return IsRoman

def ReadLine():
global roman
global length
global rlength
roman = input()
global line
line = roman
length = len(roman)
rlength = length

""" Read a Roman numeral from the user. Keep prompting until the user supplies a legal Roman numeral. """

def ReadRoman():
flag = False
while not flag:
print('Please type a Roman numeral: ', end = '')
ReadLine()
if not IsRoman():
print('*** What you typed isn\'t a valid Roman numeral.')
else:
flag = True

DECIMALEQUIV = {
'M' : 1000,
'D' : 500,
'C' : 100,
'L' : 50,
'X' : 10,
'V' : 5,
'I' : 1
}

# Given a Roman numeral and the number of Roman digits it contains, return its value in decimal.

def DecimalValue(roman, length):
sum = 0
k = 0
for k in range(0, length-1):
if DECIMALEQUIV[roman[k]] >= DECIMALEQUIV[roman[k+1]]:
sum = sum + DECIMALEQUIV[roman[k]]
else:
sum = sum - DECIMALEQUIV[roman[k]]
DecimalValue = sum + DECIMALEQUIV[roman[length-1]]
return DecimalValue

def PrintDecimal(n):
# Print Decimal
print(n)

# Main Function

while True:
ReadRoman()
PrintDecimal (DecimalValue (roman, rlength))

Chapter 5

Durk Jan de Bruin

Code for the Rule-Confirmation Approach

MAXSTRLEN = 20
DEBUGGING = False
state = 0
STATE3 = 1
STATE9 = 2
STATE4 = 3
STATE8 = 4
ERROR = 5
SUCCESS = 6

DigitTable = ''
LineType = ''
StateType = 0

TENDIGITS = 'IXCM'
FIVEDIGITS = 'VLD'
roman = ''
length = 0

""" Return the position of the first character in line starting at position pos that is not the same as ch. If all characters in line starting at position pos are the same as ch, return length + 1. """

def PositionOfNon (line, length, pos, ch):
done = False
while not done:
if pos > length:
done = True
elif line[pos] != ch:
done = True
else:
pos = pos + 1
PositionOfNon = pos
return PositionOfNon

""" Check for a Roman numeral representing at most 3999, 399, 39, or 3; state should have the value STATE3. maxDigits will be 4, 3, 2, or 1 depending on which of the Roman numerals is being checked for. Proceed by looking for a sequence of M's, C's, X's, or I's, depending on maxDigits; if there are four or more in the sequence, there is an error, othervise the state and the position within the line are updated. The next thing to look for will be either 999, 99, 99, or the end of the numeral. """

def Update3x (maxDigits, line, length, linePos):
global state
k = 0
k = PositionOfNon(line, length, linePos, TENDIGITS[maxDigits])
if k - linePos > 3:
state = ERROR
elif k > length:
state = SUCCESS
else:
linePos = k
state = STATE9
maxDigits = maxDigits - 1

""" Check for a Roman numeral representing at most 999, 99, or 9; state should have the value STATE9. maxDigits will be 3, 2, or 1 depending on which of the Roman numerals is being checked for. Proceed by looking for one of CM, XC, or IX, depending on maxDigits. If it is found, the next thing to look for will be 99, 9, or the end of the numeral; if not, the next thing to look for will be 899, 89, or 8. """

def Update9x (maxDigits, line, length, linePos):
global state
if linePos + 1 > length:
state = STATE8
elif (line[linePos] == TENDIGITS[maxDigits]) and (line[linePos + 1] == TENDIGITS[maxDigits + 1]):
linePos = linePos + 2
maxDigits = maxDigits - 1
else:
state = STATE8

""" Check for a Roman numeral representing at most 499, 49, or 4; state should have the value STATE4. maxDigits will be 3, 2, or 1 depending on which of the Roman numerals is being checked for. Proceed by looking for one of CD, XL, or IV, depending on maxDigits. If it is found, the next thing to look for will be 99, 9, or the end of the numeral; if not, the next thing to look for will be 399, 39, or 3 . """

def Update4x (maxDigits, line, length, linePos):
global state
if linePos + 1 > length:
state = STATE3
elif (line[linePos] == TENDIGITS[maxDigits]) and (line[linePos + 1] == FIVEDIGITS[maxDigits]):
linePos = linePos + 2
state = STATE9
maxDigits = maxDigits - 1
else:
state = STATE3

""" Check for a Roman numeral representing at most 899, 89, or 8; state should have the value STATES. maxDigits will be 3, 2, or 1 depending on which of the values is being checked for. Proceed by looking for D, L, or V, depending on maxDigits. If it is found, the next thing to look for will be 399, 39, or 3; if not, the next thing to look for will be 499, 49, or 4."""

def Update8x (maxDigits, line, length, linePos):
global state
if linePos > length:
state = SUCCESS
elif line[linePos] == FIVEDIGITS[maxDigits]:
linePos = linePos + 1
state = STATE3
else:
state = STATE4

""" Check for the end of the numeral. If there are more digits, the numeral is illegal; otherwise, a Roman numeral has been successfully recognized.

def Update0 (length, linePos):
global state
if linePos == length + 1:
state = SUCCESS
else:
state = ERROR

""" Return true exactly when line contains a legal Roman numeral. """

Chapter 5

Durk Jan de Bruin

def IsRoman():
global line
global length
global state
state = STATE3
maxDigits = 3
linePos = 0
while (state != ERROR) and (state != SUCCESS):
if DEBUGGING:
print('*** linePos = {}'.format(linePos),end = '')
if state == STATE3:
print('state = 3, ' end = '')
if state == STATE9:
print('state = 9, ' end = '')
if state == STATE4:
print('state = 4, ' end = '')
if state == STATE8:
print('state = 8, ' end = '')
print('maxDigits = {}'.format(maxDigits))
print(line, length, linePos)
if maxDigits == 0:
Update0(length, linePos)
else:
if state == STATE3:
Update3x (maxDigits, line, length, linePos)
if state == STATE9:
Update9x (maxDigits, line, length, linePos)
if state == STATE8:
Update8x (maxDigits, line, length, linePos)
if state == STATE4:
Update4x (maxDigits, line, length, linePos)
IsRoman = state = SUCCESS
return IsRoman